{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploring Spark SQL and CSV data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Getting started###"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Spark\n",
    "* Requires [Apache Spark](https://spark.apache.org/) version 1.3 or above.\n",
    "* We will mainly be looking at aggregation with Spark SQL and the new [Spark SQL DataFrame API](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame)\n",
    "* This offers a number of features similar to those provded by data frames in R or Pandas.\n",
    "\n",
    "#### Sample data\n",
    "* This exercise uses a sample CSV file of the latest monthly \"price paid data\" for house sales from the UK Land Registry.\n",
    "* This sample file is in the /data directory.\n",
    "* You can [download the latest version of the monthly PPD CSV file](http://publicdata.landregistry.gov.uk/market-trend-data/price-paid-data/a/pp-monthly-update-new-version.csv) (usually around 14MB) directly from the Land Registry website. \n",
    "* Other data formats are [also available for download](http://data.gov.uk/dataset/land-registry-monthly-price-paid-data).\n",
    "* The CSV files do not contain the column definitions, but [these are defined on the Land Registry website](https://www.gov.uk/about-the-price-paid-data)\n",
    "\n",
    "#### Data licence\n",
    "* The UK Land Registry data is [published under specific licence conditions](https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads#when-using-or-publishing-our-price-paid-data)\n",
    "* Attribution for UK Land Registry data:  Data produced by Land Registry © Crown copyright 2015."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read CSV data into Spark and create data frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from pyspark.sql import SQLContext\n",
    "from pyspark import SparkConf\n",
    "\n",
    "### Seems like PySpark SparkContext is already available in Spark 1.3.0 ###\n",
    "#conf = SparkConf().setMaster(\"local[*]\")\n",
    "#sc = SparkContext(conf)\n",
    "\n",
    "sqlContext = SQLContext(sc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Load CSV file as an RDD and convert to a DataFrame\n",
    "\n",
    "* Spark SQL does not currently provide a built-in function to load CSV, so we do this manually using sc.textFile().\n",
    "* This creates an RDD, as usual.\n",
    "* We then map the data rows to tuples ready for converting to a DataFrame.\n",
    "* We also provide a set of field (column) definitions that will define the schema.\n",
    "* These data rows and column definitions are then combined to create a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Need to import StructField etc\n",
    "from pyspark.sql.types import *\n",
    "\n",
    "# Load a CSV file and convert each line to a tuple.\n",
    "lines = sc.textFile(\"data/ppd-monthly-sample.csv\")\n",
    "data = lines.map(lambda l: l.split(\",\"))\n",
    "data_tuples = data.map(lambda p: (p[0].strip(\"\\\"\"), float(p[1].strip(\"\\\"\")), p[2].strip(\"\\\"\"), p[3].strip(\"\\\"\") \\\n",
    ", p[4].strip(\"\\\"\"), p[5].strip(\"\\\"\"), p[6].strip(\"\\\"\"), p[7].strip(\"\\\"\"), p[8].strip(\"\\\"\"), p[9].strip(\"\\\"\")\\\n",
    ", p[10].strip(\"\\\"\"), p[11].strip(\"\\\"\"), p[12].strip(\"\\\"\"), p[13].strip(\"\\\"\"), p[14].strip(\"\\\"\")))\n",
    "\n",
    "# Construct the field definitions for the schema\n",
    "# - most fields will be StringType here\n",
    "# - the sale price will be a DoubleType\n",
    "fields = []\n",
    "fields.append(StructField(\"transaction_uid\", StringType(), True))\n",
    "fields.append(StructField(\"sale_price\", DoubleType(), True))\n",
    "fields.append(StructField(\"date_of_transfer\", StringType(), True))\n",
    "fields.append(StructField(\"postcode\", StringType(), True))\n",
    "fields.append(StructField(\"property_type\", StringType(), True))\n",
    "fields.append(StructField(\"old_new\", StringType(), True))\n",
    "fields.append(StructField(\"duration\", StringType(), True))\n",
    "fields.append(StructField(\"paon\", StringType(), True))\n",
    "fields.append(StructField(\"saon\", StringType(), True))\n",
    "fields.append(StructField(\"street\", StringType(), True))\n",
    "fields.append(StructField(\"locality\", StringType(), True))\n",
    "fields.append(StructField(\"city\", StringType(), True))\n",
    "fields.append(StructField(\"district\", StringType(), True))\n",
    "fields.append(StructField(\"county\", StringType(), True))\n",
    "fields.append(StructField(\"record_status\", StringType(), True))\n",
    "\n",
    "schema = StructType(fields)\n",
    "\n",
    "# Apply the schema to the RDD of tuples:\n",
    "salesDf = sqlContext.createDataFrame(data_tuples, schema)"
   ]
  },
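  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A note on quoted commas\n",
    "* The loader above splits each line on every comma, which only works if no quoted field contains a comma.\n",
    "* As a minimal sketch (not part of the original exercise), Python's standard csv module can parse a single line while respecting quotes.\n",
    "* The sample line and the parse_csv_line() helper are made up for illustration; under Python 2 the csv module may need extra care with non-ASCII text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "def parse_csv_line(line):\n",
    "    \"\"\"Split one CSV line into fields, keeping commas inside quotes.\"\"\"\n",
    "    return next(csv.reader([line]))\n",
    "\n",
    "# Hypothetical line (not from the Land Registry file) with a comma inside a quoted field\n",
    "sample = '\"id-1\",\"250000\",\"FLAT 1, BLOCK A\",\"LONDON\"'\n",
    "\n",
    "print(\"naive split: {0} fields\".format(len(sample.split(\",\"))))\n",
    "print(\"csv module:  {0} fields\".format(len(parse_csv_line(sample))))"
   ]
  },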
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Display the schema of the DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "root\n",
      " |-- transaction_uid: string (nullable = true)\n",
      " |-- sale_price: double (nullable = true)\n",
      " |-- date_of_transfer: string (nullable = true)\n",
      " |-- postcode: string (nullable = true)\n",
      " |-- property_type: string (nullable = true)\n",
      " |-- old_new: string (nullable = true)\n",
      " |-- duration: string (nullable = true)\n",
      " |-- paon: string (nullable = true)\n",
      " |-- saon: string (nullable = true)\n",
      " |-- street: string (nullable = true)\n",
      " |-- locality: string (nullable = true)\n",
      " |-- city: string (nullable = true)\n",
      " |-- district: string (nullable = true)\n",
      " |-- county: string (nullable = true)\n",
      " |-- record_status: string (nullable = true)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "salesDf.printSchema()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Explore data frame with SQL\n",
    "\n",
    "#### Use SQL to derive summary data\n",
    "* We will fetch total number of sales, value of sales and average sale price by postcode.\n",
    "* We will then use different approaches to derive the top N postcodes by various criteria\n",
    "\n",
    "#### Remember that each of these operations will cause the RDD data to be materialised\n",
    "* We will cache the initial data table and the aggregate data (sales_by_postcode).\n",
    "* This should improve performance when we re-query the postcode data below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Register the DataFrame as a table.\n",
    "salesDf.registerTempTable(\"sales\")\n",
    "\n",
    "# Cache it so we can run multiple queries\n",
    "sqlContext.cacheTable(\"sales\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use SQL to derive aggregate data from the DataFrame table\n",
    "* SQL can be run over DataFrames that have been registered as a table.\n",
    "* Bear in mind that the SQL is not parsed or validated until runtime.\n",
    "* For example, fetch total sales value, count and average sale price by postcode:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Fetch total sales value, count and average sale price by postcode:\n",
    "#\n",
    "# SELECT postcode, \n",
    "#        COUNT(*) AS num_sales,\n",
    "#        SUM(sale_price)  AS value_sales, \n",
    "#        AVG(sale_price) AS avg_price \n",
    "# FROM   sales \n",
    "# WHERE  postcode IS NOT NULL \n",
    "# AND postcode != '' \n",
    "# GROUP BY postcode\n",
    "#\n",
    "sales_by_postcode = sqlContext.sql(\"SELECT postcode, COUNT(*) AS num_sales, SUM(sale_price)  AS value_sales, AVG(sale_price) AS avg_price FROM sales WHERE postcode IS NOT NULL AND postcode != '' GROUP BY postcode\")"
   ]
  },
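  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### What the aggregate query computes\n",
    "* As a rough illustration only, the following plain-Python sketch computes the same three measures (count, total and average) that the SQL query above requests.\n",
    "* The rows and the summarise_by_postcode() helper are made up for this example; they are not part of Spark or of the Land Registry data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "def summarise_by_postcode(rows):\n",
    "    \"\"\"Return {postcode: (num_sales, value_sales, avg_price)} for (postcode, price) rows.\"\"\"\n",
    "    totals = defaultdict(lambda: [0, 0.0])\n",
    "    for postcode, price in rows:\n",
    "        # Mirror the WHERE clause: skip NULL (None) and empty postcodes\n",
    "        if postcode is None or postcode == \"\":\n",
    "            continue\n",
    "        totals[postcode][0] += 1\n",
    "        totals[postcode][1] += price\n",
    "    return dict((pc, (cnt, total, total / cnt)) for pc, (cnt, total) in totals.items())\n",
    "\n",
    "# Made-up rows for illustration only\n",
    "rows = [(\"AA1 1AA\", 100000.0), (\"AA1 1AA\", 300000.0), (\"BB2 2BB\", 250000.0), (\"\", 999.0), (None, 1.0)]\n",
    "summary = summarise_by_postcode(rows)\n",
    "print(summary[\"AA1 1AA\"])"
   ]
  },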
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Cache the aggregate data frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DataFrame[postcode: string, num_sales: bigint, value_sales: double, avg_price: double]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sales_by_postcode.cache()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set the value of N for your Top N queries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* We will be looking for the top N postcodes using various query methods below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n = 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Top N postcodes by number of sales\n",
    "\n",
    "#### Using an RDD (the old-fashioned way\n",
    "\n",
    "* Numbers are rounded here for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "E2 0SZ: count = 70, total value = 22301624, average sale price:  318595\n",
      "N4 2GS: count = 67, total value = 31506180, average sale price:  470241\n",
      "E3 3SU: count = 52, total value = 18758584, average sale price:  360742\n",
      "SE3 9FJ: count = 45, total value = 14099895, average sale price:  313331\n",
      "E2 0FG: count = 45, total value = 22665000, average sale price:  503667\n"
     ]
    }
   ],
   "source": [
    "# Remember that in Python we need to explicitly request the RDD via .rdd\n",
    "topSalesByCount = sales_by_postcode.rdd.sortBy(lambda data: data[1], ascending=False).take(n)\n",
    "\n",
    "for spc in topSalesByCount:\n",
    "  print (\"{0}: count = {1}, total value = {2:7.0f}, average sale price: {3:7.0f}\".format(spc.postcode, spc.num_sales, spc.value_sales, spc.avg_price))"
   ]
  },
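  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### How the rounding works\n",
    "* The rounding in the loop above comes from the format specification {2:7.0f}: argument 2, minimum width 7, no decimal places.\n",
    "* The sketch below applies the same format string to the unrounded E2 0SZ figures that the aggregate query reports for this sample, without needing Spark."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Unrounded figures for postcode E2 0SZ from this notebook's sample data\n",
    "postcode, num_sales, value_sales, avg_price = \"E2 0SZ\", 70, 22301624.0, 318594.6285714286\n",
    "\n",
    "# Same format string as the loop above: width 7, zero decimal places\n",
    "line = \"{0}: count = {1}, total value = {2:7.0f}, average sale price: {3:7.0f}\".format(\n",
    "    postcode, num_sales, value_sales, avg_price)\n",
    "print(line)"
   ]
  },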
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the new DataFrame API\n",
    "\n",
    "* In this case we use the show() function rather than println() to display the chosen records directly.\n",
    "* Data values are left in the internal numeric format for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------+---------+-----------+------------------+\n",
      "|postcode|num_sales|value_sales|         avg_price|\n",
      "+--------+---------+-----------+------------------+\n",
      "|  E2 0SZ|       70|2.2301624E7| 318594.6285714286|\n",
      "|  N4 2GS|       67| 3.150618E7|470241.49253731343|\n",
      "|  E3 3SU|       52|1.8758584E7|          360742.0|\n",
      "|  E2 0FG|       45|   2.2665E7| 503666.6666666667|\n",
      "| SE3 9FJ|       45|1.4099895E7|          313331.0|\n",
      "+--------+---------+-----------+------------------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "sales_by_postcode.orderBy(sales_by_postcode.num_sales.desc()).limit(n).show()"
   ]
  },
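  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A note on ties\n",
    "* The RDD and DataFrame results above list the tied postcodes E2 0FG and SE3 9FJ (45 sales each) in opposite orders, because neither sort specifies a tie-break.\n",
    "* The plain-Python sketch below, using the counts from that output, shows one way to make the order deterministic by sorting on a second key; top_n() is a made-up helper.\n",
    "* In Spark, passing a second column to orderBy() should have a similar effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# (postcode, num_sales) pairs copied from the output above\n",
    "counts = [(\"E2 0SZ\", 70), (\"N4 2GS\", 67), (\"E3 3SU\", 52), (\"SE3 9FJ\", 45), (\"E2 0FG\", 45)]\n",
    "\n",
    "def top_n(pairs, n):\n",
    "    \"\"\"Top n pairs by count (descending), ties broken by postcode (ascending).\"\"\"\n",
    "    return sorted(pairs, key=lambda p: (-p[1], p[0]))[:n]\n",
    "\n",
    "for postcode, num_sales in top_n(counts, 5):\n",
    "    print(\"{0}: {1}\".format(postcode, num_sales))"
   ]
  },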
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Top N postcodes by value of sales\n",
    "\n",
    "#### Using an RDD (the old-fashioned way\n",
    "\n",
    "* Numbers are rounded here for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SW1W 9AH: count = 8, total value = 36988375, average sale price: 4623547\n",
      "N2 0BE: count = 1, total value = 33700000, average sale price: 33700000\n",
      "WC2H 0DT: count = 8, total value = 31604200, average sale price: 3950525\n",
      "N4 2GS: count = 67, total value = 31506180, average sale price:  470241\n",
      "W14 8QA: count = 21, total value = 26459075, average sale price: 1259956\n"
     ]
    }
   ],
   "source": [
    "# Print the top N postcodes by value of sales\n",
    "# Remember that in Python we need to explicitly request the RDD via .rdd\n",
    "topSalesByValue = sales_by_postcode.rdd.sortBy(lambda data: data[2], ascending=False).take(n)\n",
    "\n",
    "for spc in topSalesByValue:\n",
    "  print (\"{0}: count = {1}, total value = {2:7.0f}, average sale price: {3:7.0f}\".format(spc.postcode, spc.num_sales, spc.value_sales, spc.avg_price))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the new DataFrame API\n",
    "\n",
    "* Data values are left in the internal numeric format for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------+---------+-----------+------------------+\n",
      "|postcode|num_sales|value_sales|         avg_price|\n",
      "+--------+---------+-----------+------------------+\n",
      "|SW1W 9AH|        8|3.6988375E7|       4623546.875|\n",
      "|  N2 0BE|        1|     3.37E7|            3.37E7|\n",
      "|WC2H 0DT|        8|  3.16042E7|         3950525.0|\n",
      "|  N4 2GS|       67| 3.150618E7|470241.49253731343|\n",
      "| W14 8QA|       21|2.6459075E7|1259955.9523809524|\n",
      "+--------+---------+-----------+------------------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "sales_by_postcode.orderBy(sales_by_postcode.value_sales.desc()).limit(n).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Top N postcodes by average sale price###\n",
    "\n",
    "#### Using an RDD (the old-fashioned way)####\n",
    "\n",
    "* Numbers are rounded here for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N2 0BE: count = 1, total value = 33700000, average sale price: 33700000\n",
      "SW10 9SJ: count = 1, total value = 16400000, average sale price: 16400000\n",
      "SW3 2EB: count = 1, total value = 14650000, average sale price: 14650000\n",
      "HP17 8EJ: count = 1, total value = 12850000, average sale price: 12850000\n",
      "W1J 5BJ: count = 1, total value = 12600000, average sale price: 12600000\n"
     ]
    }
   ],
   "source": [
    "# Remember that in Python we need to explicitly request the RDD via .rdd\n",
    "topSalesByAvgPrice = sales_by_postcode.rdd.sortBy(lambda data: data[3], ascending=False).take(n)\n",
    "\n",
    "for spc in topSalesByAvgPrice:\n",
    "  print (\"{0}: count = {1}, total value = {2:7.0f}, average sale price: {3:7.0f}\".format(spc.postcode, spc.num_sales, spc.value_sales, spc.avg_price))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using the new DataFrame API####\n",
    "\n",
    "* Data values are left in the internal numeric format for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------+---------+-----------+---------+\n",
      "|postcode|num_sales|value_sales|avg_price|\n",
      "+--------+---------+-----------+---------+\n",
      "|  N2 0BE|        1|     3.37E7|   3.37E7|\n",
      "|SW10 9SJ|        1|     1.64E7|   1.64E7|\n",
      "| SW3 2EB|        1|    1.465E7|  1.465E7|\n",
      "|HP17 8EJ|        1|    1.285E7|  1.285E7|\n",
      "| W1J 5BJ|        1|     1.26E7|   1.26E7|\n",
      "+--------+---------+-----------+---------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "sales_by_postcode.orderBy(sales_by_postcode.avg_price.desc()).limit(n).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Do it all with the DataFrame API\n",
    "\n",
    "* The [Spark SQL DataFrame API](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame) also provides functionality allowing you to query and aggregate directly in the data frame without using any SQL.\n",
    "* Some of the groupBy and aggregation functions seem to be a bit clunky compared to SQL.\n",
    "* However, they seem to run more quickly than the RDD versions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Group data by postcode\n",
    "* First we group the data from the original data-frame (salesDf) by postcode, excluding any empty or NULL postcodes.\n",
    "* This is roughly equivalent to the SQL GROUP BY query we used above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Group the data by postcode\n",
    "group_by_postcode = salesDf.filter(\"postcode IS NOT NULL\").filter(\"postcode != ''\").groupBy(salesDf.postcode) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Count sales by postcode\n",
    "* Now we apply the count() to get the number of sales by postcode.\n",
    "* Then, we sort the counts in descending order with the desc() function (imported from pyspark.sql.functions).\n",
    "* The groupBy() function allows us to get at the count directly.\n",
    "* Other aggregation functions require a bit more work (see below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------+-----+\n",
      "|postcode|count|\n",
      "+--------+-----+\n",
      "|  E2 0SZ|   70|\n",
      "|  N4 2GS|   67|\n",
      "|  E3 3SU|   52|\n",
      "|  E2 0FG|   45|\n",
      "| SE3 9FJ|   45|\n",
      "+--------+-----+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Need to use some extra functions for sorting etc\n",
    "from pyspark.sql import functions as F\n",
    "\n",
    "num_sales_by_postcode = group_by_postcode.count().orderBy(F.desc(\"count\"))\n",
    "num_sales_by_postcode.limit(n).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Top N postcodes by value of sales\n",
    "* The Spark SQL agg() functions seem to generate artificial column names at runtime e.g. \"SUM(sale_price#1033)\".\n",
    "* However, it does not seem possible to apply an alias() function to this column to get a sensible name.\n",
    "* But when we apply the orderBy() function, we need to know the name of the column to sort by.\n",
    "* So in this example we get the name of the aggregate column from the data frame at runtime.\n",
    "* Not sure if there is a better way to do this, but it works for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------+---------------+\n",
      "|postcode|sum(sale_price)|\n",
      "+--------+---------------+\n",
      "|SW1W 9AH|    3.6988375E7|\n",
      "|  N2 0BE|         3.37E7|\n",
      "|WC2H 0DT|      3.16042E7|\n",
      "|  N4 2GS|     3.150618E7|\n",
      "| W14 8QA|    2.6459075E7|\n",
      "+--------+---------------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Get the sum of sales prices from the grouped data\n",
    "sales_value_by_postcode = group_by_postcode.sum(\"sale_price\") \n",
    "# Get the generated name of the SUM column so we can sort by it\n",
    "agg_col = (sales_value_by_postcode.columns)[1]\n",
    "# Apply the sort and take the N highest records\n",
    "sales_value_by_postcode.orderBy(F.desc(agg_col)).limit(n).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Top N postcodes by average sale price\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------+---------------+\n",
      "|postcode|avg(sale_price)|\n",
      "+--------+---------------+\n",
      "|  N2 0BE|         3.37E7|\n",
      "|SW10 9SJ|         1.64E7|\n",
      "| SW3 2EB|        1.465E7|\n",
      "|HP17 8EJ|        1.285E7|\n",
      "| W1J 5BJ|         1.26E7|\n",
      "+--------+---------------+\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Get the average of sales prices from the grouped data\n",
    "avg_price_by_postcode = group_by_postcode.avg(\"sale_price\") \n",
    "# Get the generated name of the average column so we can sort by it\n",
    "agg_col = (avg_price_by_postcode.columns)[1]\n",
    "# Apply the sort and take the N highest records\n",
    "avg_price_by_postcode.orderBy(F.desc(agg_col)).limit(n).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cleanup\n",
    "* Clear out the cached data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sqlContext.clearCache()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusions\n",
    "\n",
    "* The Spark SQL API already offers some useful tools for applying SQL queries to RDD data.\n",
    "* We have only taken a brief look at the new DataFrame API here, but so far it looks even more useful.\n",
    "* The data frame aggregation functions are slightly awkward, but seem to perform well.\n",
    "* There seems to be plenty more functionality to explore in this rapidly growing corner of the Apache Spark project."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
